Applying weighted network measures to microarray distance matrices
Authors
Abstract
In recent work we presented a new approach to the analysis of weighted networks, by providing a straightforward generalization of any network measure defined on unweighted networks. This approach is based on the translation of a weighted network into an ensemble of edges, and is particularly suited to the analysis of fully connected weighted networks. Here we apply our method to several such networks including distance matrices, and show that the clustering coefficient, constructed by using the ensemble approach, provides meaningful insights into the systems studied. In the particular case of two data sets from microarray experiments the clustering coefficient identifies a number of biologically significant genes, outperforming existing identification approaches.
The rise of information technology and the internet, together with the more recent advent of high-throughput technologies in biology, has made it easier to obtain large amounts of data on complex networks. Increasingly this also includes data on weighted complex networks, which now appear in many different guises: transport and traffic [1, 2], trade or communication networks, financial networks [3], and collaboration networks [4], to name a few. In biology, genetic regulation and transcription [5] and protein interaction [6] have been studied in this context. However, the extraction of meaningful physical or biological information from these networks is a difficult task.
For unweighted complex networks, with binary adjacency matrices, a set of local and global measures on the network has been defined [7], including the degree of a node, its average nearest-neighbour degree [8] and its clustering coefficient [9]. Defining these measures for weighted networks is more difficult and has been the subject of recent research [2, 5, 10, 11]. A review of definitions of weighted clustering coefficients can be found in [12].
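As a point of reference for the unweighted measures named above, the following is a minimal sketch using their standard textbook definitions on a binary, symmetric adjacency matrix. The function names and the toy graph are illustrative assumptions and do not come from the paper; NumPy is assumed to be available.

```python
import numpy as np

def degree(a):
    """Degree k_i = sum_j a_ij of every node."""
    return a.sum(axis=1)

def average_neighbour_degree(a):
    """k_nn,i = (1 / k_i) * sum_j a_ij * k_j; set to 0 for isolated nodes."""
    k = degree(a).astype(float)
    neighbour_sum = a @ k
    return np.divide(neighbour_sum, k, out=np.zeros_like(k), where=k > 0)

def clustering_coefficient(a):
    """C_i = (A^3)_ii / (k_i * (k_i - 1)); set to 0 when k_i < 2."""
    k = degree(a).astype(float)
    # (A^3)_ii counts closed walks of length 3, i.e. twice the triangles at i.
    closed_walks = np.diagonal(a @ a @ a).astype(float)
    pairs = k * (k - 1)
    return np.divide(closed_walks, pairs, out=np.zeros_like(k), where=pairs > 0)

# Hypothetical toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

print(degree(A))
print(average_neighbour_degree(A))
print(clustering_coefficient(A))
```

All three functions are sums of products of adjacency-matrix entries, or ratios of such sums.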
In a recent paper [13] we introduced a new approach to this problem, which allows for a straightforward generalization of any measure defined on an unweighted network to weighted networks. Here we apply the clustering coefficient defined in this way to distance matrices, which are fully connected weighted networks. The distance matrices are generated from microarray expression series, so that closely related series (by some chosen similarity measure) are separated by a short distance, which in the network picture translates into an edge with a large weight.
The basis of our approach is to find a continuous bijective map $M : \mathbb{R} \to [0, 1]$ from the real numbers to the interval between 0 and 1, which maps the weights $w_{ij} \in \mathbb{R}$ to a quantity $p_{ij} \in [0, 1]$. A simple example of such a map is a linear normalization of the weights:
$$p_{ij} = \frac{w_{ij} - \min(w_{ij})}{\max(w_{ij}) - \min(w_{ij})} \qquad (1)$$
This simple normalization maps $\min(w_{ij})$ to zero. While this is often acceptable in the case of a distance matrix, one should make a more sophisticated choice of map if there are many edges with weight $\min(w_{ij})$. Similarly, if the network has negative weights as well as positive ones, the normalized modulus of the original weights might be a more appropriate choice. A more detailed discussion of the choice of map can be found in [13].
The ideas introduced in [13] are based on an interpretation of the matrix $P$ with entries $\{p_{ij}\}$ as a matrix of probabilities. These probabilities can be interpreted as an ensemble of edges or, more concisely, an ensemble network. Thus, just as any binary square matrix can be understood as an unweighted network and any real square matrix corresponds to a weighted network, any square matrix with entries between 0 and 1 corresponds to an ensemble network. If we sample each edge of the ensemble network exactly once, we obtain an unweighted network, which we term a realization of the ensemble network.
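The linear normalization of Eq. (1) and the sampling of a single realization can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy matrix are hypothetical, and taking the minimum and maximum over off-diagonal entries only, as well as excluding self-loops, are assumptions that the text does not specify.

```python
import numpy as np

def normalize_weights(w):
    """Apply the linear map of Eq. (1) to a square weight matrix.

    Assumption: min and max are taken over off-diagonal entries only,
    and the diagonal of the result is set to zero (no self-loops).
    """
    w = np.asarray(w, dtype=float)
    off_diag = ~np.eye(w.shape[0], dtype=bool)
    w_min, w_max = w[off_diag].min(), w[off_diag].max()
    if w_max == w_min:
        raise ValueError("all weights are equal, so Eq. (1) is undefined")
    p = (w - w_min) / (w_max - w_min)
    np.fill_diagonal(p, 0.0)
    return p

def sample_realization(p, rng):
    """Sample each undirected edge exactly once with probability p_ij."""
    n = p.shape[0]
    # Draw one uniform number per node pair, keep the upper triangle only,
    # then mirror it so that the realization is symmetric.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

# Hypothetical symmetric weight matrix for three nodes.
W = np.array([
    [0.0, 2.0, 6.0],
    [2.0, 0.0, 4.0],
    [6.0, 4.0, 0.0],
])

P = normalize_weights(W)
R = sample_realization(P, np.random.default_rng(0))
print(P)
print(R)
```

In this toy case the edge with the smallest weight gets probability 0 and never appears in a realization, which illustrates why the text cautions against the linear map when many edges share the minimum weight.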
In particular, $p_{ij}$ is the probability that the edge between nodes $i$ and $j$ exists. These concepts are valid both for directed networks, with any $p_{ij} \in [0, 1]$, and undirected networks, for which $p_{ij} = p_{ji}$, so that the matrix is symmetric.
In a real-world weighted network, the original weights can represent almost any physical quantity, such as the strength of a collaboration between two scientists, or the number of passengers traveling between two countries. By mapping these weights to probabilities we rid ourselves of the interpretational burden of these weights, whilst retaining all the topological information they contain. It should be noted that in many cases the interpretation of weights as probabilities also makes intuitive physical sense. Whenever the weights in a network represent a magnitude of flow, this can be interpreted directly in terms of the probability that a transfer occurs during a given unit of time. Examples include traffic and transport networks as well as communication networks, where we have units (passengers, money, signals) which form an edge, through their transfer, with a probability proportional to the flow rate.
All measures on unweighted networks can be written as functions of the entries $a_{ij}$ of an adjacency matrix $A$. In fact, they can generally be written as a polynomial of these entries, or a simple ratio of such polynomials. Note that, for an unweighted network, each entry is either 0 or 1, so $a_{ij} = a_{ij}^m$ for all positive integers $m$, and these polynomials are of first order only. Consider a general first-order polynomial, which can be written fully expanded as:
Similar resources
Applying weighted network measures to distance matrices
Many approaches to the analysis of weighted networks are not designed for fully connected weighted networks. However, as any distance matrix between objects is a fully connected weighted network, such networks are extremely common. In earlier work we derived an approach for the analysis of weighted networks which also works on fully connec...
Inverse Maximum Dynamic Flow Problem under the Sum-Type Weighted Hamming Distance
Inverse maximum flow (IMDF) is among the most important problems in the field of dynamic network flow, and has been considered under Euclidean norm measures in previous research. However, recent studies have mainly focused on inverse problems under the Hamming distance measure due to their practical and important applications. In this paper, we study a general approach for handling the inv...
Ensemble approach to the analysis of weighted networks.
We present an approach to the analysis of weighted networks, by providing a straightforward generalization of any network measure defined on unweighted networks, such as the average degree of the nearest neighbors, the clustering coefficient, the "betweenness," the distance between two nodes, and the diameter of a network. All these measures are well established for unweighted networks but have...
Different Network Performance Measures in a Multi-Objective Traffic Assignment Problem
Traffic assignment algorithms are used to determine possible use of paths between origin-destination pairs and predict traffic flow in network links. One of the main deficiencies of ordinary traffic assignment methods is that in most of them one measure (mostly travel time) is usually included in the objective function and other effective performance measures in traffic assignment are not considere...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression gives access to a wealth of information with thousands of variables (genes). Analysis of this information can reveal genetic causes of disease and differences between tumors. In this study we try to reduce the high-dimensional data by a statistical method to select valuable genes with high impact as biomarkers, and then classify ovarian tumors based on gene expression data of...